How to use Stata efficiently
Programming languages are precise. Unlike natural language, a computer will not tolerate syntax or spelling errors. For this reason, it’s important to build good programming habits so that you 1) avoid mistakes and 2) identify them more easily.
1. How to start your dofile
Always begin your dofile in the same way (just copy and paste from old dofiles): 1) some basic information, 2) the packages necessary to run the code, 3) preliminary settings, and 4) the file structure. Example:
/*******************************************************************************
********************************************************************************
Lab 1 for Econometrics class
Written by Eirik Berger
PhD Research Scholar
The Norwegian School of Economics (NHH)
********************************************************************************
*******************************************************************************/
* Extra packages used (uncomment if not already installed)
* ssc install estout
* Preliminary settings
clear all
set matsize 1600
set scheme s1color
set more off
********************************************************************************
********************************************************************************
* Top folder for the project
global top "/Users/eirikberger/Dropbox/0_AKADEMISK/department_work/ECN402_spring_2020/lab0"
* Other globals: Based on global top
global data "$top/data"
global dofiles "$top/dofiles"
global figures "$top/figures"
global tables "$top/tables"
cd "$top"
/*******************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
Question 1
Part a
********************************************************************************
********************************************************************************
********************************************************************************
*******************************************************************************/
2. Load and check your data
Always add “clear” as an option when you open a .dta file. Also, don’t include the full path: It is cleaner to either use globals (see below) or use the “cd” (change directory) command.
use "$data/BWGHT.DTA", clear
list faminc cigtax cigprice in 1/10, noobs
+----------------------------+
| faminc   cigtax   cigprice |
|----------------------------|
|   13.5     16.5      122.3 |
|    7.5     16.5      122.3 |
|     .5     16.5      122.3 |
|   15.5     16.5      122.3 |
|   27.5     16.5      122.3 |
|----------------------------|
|    7.5     16.5      122.3 |
|     65     16.5      122.3 |
|   27.5     16.5      122.3 |
|   27.5     16.5      122.3 |
|   37.5     16.5      122.3 |
+----------------------------+
sum faminc
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
faminc | 1,388 29.02666 18.73928 .5 65
Or simply use the “browse” command (“br” for short) to look at your dataset directly.
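Beyond listing a few rows, a handful of built-in commands give a quick overview of a freshly loaded dataset. The following is a minimal sketch, assuming BWGHT.DTA sits in the data folder as above; the condition in the assert line is only an assumed example of a sanity check, not something required by the assignment.

```stata
use "$data/BWGHT.DTA", clear
describe                         // variable names, storage types and labels
codebook faminc cigs, compact    // one-line summary per variable
misstable summarize              // reports variables with missing values, if any
* Sanity check (assumed condition): Stata stops with an error if any
* observation violates it
assert cigs >= 0 & !missing(cigs)
```

If the assert line fails, the dofile stops at that point, which is usually what you want when the data are not what you expected.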
3. How to re-use results (using locals)
Sometimes we want to save certain numbers coming from Stata commands for later use. Example: What is the share of white mothers who smoke? First, we find the number of white mothers.
count if white==1
1,089
Then the number of white mothers who smoke.
count if white==1 & cigs>0
165
I copy the answers from the above commands and calculate the share using the “display” command. Note that the display command works as a calculator when the expression is not enclosed in quotation marks.
display 165/1089
.15151515
However, we can do this without any manual copy-and-paste work. Using the “return list” command after running the “count” command, we find that the count command returns (saves) the scalar “r(N)”.
count if white==1
1,089
return list
scalars:
r(N) = 1089
Similarly for the “sum” command:
sum cigs
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
cigs | 1,388 2.087176 5.972688 0 50
return list
scalars:
r(N) = 1388
r(sum_w) = 1388
r(mean) = 2.087175792507205
r(Var) = 35.67300052567169
r(sd) = 5.972687881152981
r(min) = 0
r(max) = 50
r(sum) = 2897
We use the “local” command to save the mean (r(mean)) as “groupmean”. You can check the content of “groupmean” by using the display command.
local groupmean = r(mean)
display "`groupmean'"
2.087175792507205
Put it all together and we can compute the share of smokers among white mothers in one go:
count if white==1
local white = r(N)
display "Saved number is: " `white'
count if white==1 & cigs>0
local whiteandsmoke = r(N)
display "Saved number is: " `whiteandsmoke'
display "The fraction of white mothers who smoke is: " `whiteandsmoke'/`white'
display "or in percent " 100*(`whiteandsmoke'/`white') "%"
1,089
Saved number is: 1089
165
Saved number is: 165
The fraction of white mothers who smoke is: .15151515
or in percent 15.151515%
Important: When using locals you must either 1) run the entire do-file in one go or 2) run the line that creates the local (like “local whiteandsmoke = r(N)”) together with the line that uses it (like “display `whiteandsmoke'”). For the second option, select all lines from the “local …” line down to the line where you use the number, and run the selection. This is because locals are short-lived: Stata deletes them as soon as it has finished running the selected code.
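A more compact version of the same calculation could look like the sketch below. It assumes BWGHT.DTA is loaded; the local name share is made up for this illustration, and the quietly prefix simply hides the output of count.

```stata
* Select and run all of these lines together (or run the whole dofile)
quietly count if white==1
local white = r(N)
quietly count if white==1 & cigs>0
local share = r(N)/`white'
display "Share of white mothers who smoke: " %5.3f `share'
```

The %5.3f part is a display format that rounds the printed number to three decimals; the value stored in the local is not rounded.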
4. Using globals to deal with folder structures
Globals are a great way of keeping your dofile clear and the file structure organized. With a global you save a string (text) under a name, like at the beginning of my dofile. Use $name to retrieve it (for example, $wd). In the following example, I save the path to my working directory and then change the directory:
global wd "/Users/eirikberger/Dropbox/0_AKADEMISK/department_work/ECN402_spring_2020/lab0"
cd "$wd"
Similarly, to open a dataset in the data folder (already saved as a global), you write:
use "$data/BWGHT.DTA", clear
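The folder globals can also be used to create missing folders and to save files. This is a sketch under the assumption that the globals from the preliminaries are defined and that the data folder exists; bwght_clean.dta is a hypothetical file name.

```stata
* Create the output folders if they do not exist yet. The capture prefix
* ignores the error that mkdir gives when a folder is already there.
capture mkdir "$figures"
capture mkdir "$tables"
* Check what a global contains
display "$data"
* Save a working copy of the data (hypothetical file name)
save "$data/bwght_clean.dta", replace
```

Keeping the path inside quotation marks matters when folder names contain spaces.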
6. Logs
You have to hand in the log from running your full dofile (as an appendix in your main pdf file). It is probably easiest to add the commands below when you create your dofile, but keep them inactive (put “*” in front of them) until you are done and run the full dofile in one go.
capture log close // Could add this to your preliminaries. Closes the log file if one is open
log using introlab, text replace // Opens a log and saves it as a text file.
capture log close // Closes and saves the log file if one is open
exit // Useful closing commands
Note that you might find the “capture” command useful at some point, especially when your dofile has to deal with several different datasets. It runs the command that follows it (on the same line) and suppresses its output, including any error message, so if that command fails, Stata simply continues with the next line instead of stopping the dofile. One example where this is useful is dropping a variable only if it exists. Suppose your dataset might contain a bwght variable and an income variable, but you are not certain:
drop bwght
This command works because there is indeed a variable named bwght. What about income:
drop income
We get an error because there is no such variable. To run the code without any error, we run:
capture drop income
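Because capture hides every error, it can be useful to check afterwards whether the command actually worked. Stata stores the return code of the last captured command in _rc (0 means no error). A minimal sketch, using the same hypothetical income variable:

```stata
* Check whether the variable exists before dropping it
capture confirm variable income
if _rc == 0 {
    drop income
    display "income dropped"
}
else {
    display "income not found, nothing dropped"
}
```

This does the same job as capture drop income, but tells you which of the two cases occurred.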
7. Use Google and the “help” command
Just do it. Ask for help if you can’t find it there.
8. Delimit
In cases where one command spans several lines (for example when you are producing figures), you can use #delimit ; and #delimit cr. Example:
#delimit ;
esttab lin_all lin_high using A1_tab1.tex,
b(%4.2f) se(%4.2f) r2(%4.2f) scalars(Prediction) replace label star(* 0.10 ** 0.05 *** 0.01)
mtitles("All countries" "High-income countries")
title("Linear regression model: Outcome is labour productivity in agriculture");
#delimit cr
Remember to add the “;” at the end of the last line in the command.
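An alternative that some people find easier to read is to end each unfinished line with ///, which tells Stata that the command continues on the next line. The sketch below is the same esttab command written that way; like #delimit, it works when run from a dofile, not when typed into the command window.

```stata
esttab lin_all lin_high using A1_tab1.tex, ///
    b(%4.2f) se(%4.2f) r2(%4.2f) scalars(Prediction) replace label ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    mtitles("All countries" "High-income countries") ///
    title("Linear regression model: Outcome is labour productivity in agriculture")
```

With this style there is no “;” to forget, but every line except the last needs the ///.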
9. Logical statements
Remember the logical and relational operators:
- & and
- | or
- != not equal to
- == equal to
- > greater than
- < less than
- >= greater than or equal to
- <= less than or equal to
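A few illustrations of how these operators combine in an if condition, assuming BWGHT.DTA is loaded. The cutoff of 30 for faminc is an arbitrary value chosen for the example.

```stata
count if white==1 & cigs>0        // white equals 1 and cigs is positive
count if male==1 | white==1       // male equals 1, or white equals 1, or both
count if cigs != 0                // cigs is not zero
* Stata treats missing values as larger than any number, so exclude them
* explicitly when using > or >=
count if faminc >= 30 & !missing(faminc)
```

The last line matters in datasets with missing values: without the !missing() part, observations with missing faminc would be counted as well.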
From last lecture
Vincent asked me to go through how you run an F-test in Stata. I think this is an excellent opportunity to learn how to figure out these kinds of questions by yourself.
1. T-test
If you search for “t-test” on Google, you will quickly find this page (the Stata manual entry for the command “ttest”): https://www.stata.com/manuals13/rttest.pdf. It gives you all the information you need about the syntax of the command, plus good examples of how to use it.
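As an illustration of what the manual entry describes, here is a short sketch using the birth weight data. The comparison value 120 and the choice of variables are assumptions made for this example only.

```stata
* One-sample test: is mean birth weight equal to 120? (120 is an arbitrary value)
ttest bwght == 120
* Two-sample test: does mean birth weight differ between male==0 and male==1?
ttest bwght, by(male)
* The results are saved as well, for example the two-sided p-value
return list
display r(p)
```

The return list and display lines refer to the most recent ttest, here the two-sample test.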
2. F-test
How should we learn to use the F-test? You guessed it: Use the same trick as for the t-test and find this page (the Stata manual entry for the command “test”): https://www.stata.com/manuals13/rtest.pdf
reg faminc cigtax cigprice bwght male white
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(5, 1382) = 40.07
Model | 61667.9131 5 12333.5826 Prob > F = 0.0000
Residual | 425392.101 1,382 307.809045 R-squared = 0.1266
-------------+---------------------------------- Adj R-squared = 0.1235
Total | 487060.014 1,387 351.160789 Root MSE = 17.544
------------------------------------------------------------------------------
faminc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigtax | -.7546118 .1254443 -6.02 0.000 -1.000694 -.50853
cigprice | .6144594 .0953262 6.45 0.000 .4274597 .8014591
bwght | .0674765 .0234134 2.88 0.004 .0215469 .113406
male | -1.750652 .9463898 -1.85 0.065 -3.607168 .1058639
white | 13.45813 1.163566 11.57 0.000 11.17558 15.74067
_cons | -54.0982 10.69192 -5.06 0.000 -75.07234 -33.12405
------------------------------------------------------------------------------
To use the F-test to test the joint null hypothesis that the coefficients on both “cigtax” and “cigprice” are equal to zero, we use the “test” command. It’s really intuitive:
test (cigtax = 0) (cigprice = 0)
( 1) cigtax = 0
( 2) cigprice = 0
F( 2, 1382) = 21.08
Prob > F = 0.0000
Or with even less code:
test cigtax cigprice
( 1) cigtax = 0
( 2) cigprice = 0
F( 2, 1382) = 21.08
Prob > F = 0.0000
What about testing if they are equal to each other?
test cigtax = cigprice
( 1) cigtax - cigprice = 0
F( 1, 1382) = 40.99
Prob > F = 0.0000
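Like count and sum, the test command saves its results, so they can be re-used with locals. The sketch below assumes BWGHT.DTA is loaded; the local name pval is made up for the example, and lincom is shown as one possible complement to the equality test.

```stata
quietly reg faminc cigtax cigprice bwght male white
test cigtax cigprice
return list                 // shows r(F), r(p), r(df) and r(df_r), among others
local pval = r(p)
display "p-value of the joint test: " %6.4f `pval'
* Estimate and confidence interval for the difference between the coefficients
lincom cigtax - cigprice
```

Remember to run the local and display lines together, as explained in the section on locals.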
Assignment 1
You should always attempt to fully automate the production of your output, as this will save you a lot of time if you have to do lots of revisions. Note: With LaTeX (Word for academia), you can go even further in making your workflow efficient. You might be okay with less efficiency in this course, but it could save you days of work when writing an empirical master’s or PhD thesis.
1. Produce and export regression results
Based on the hint dofile of assignment 1:
* Install estout (place "*" in front of ssc... once installed)
ssc install estout
Run two regressions and store the estimates from each.
reg bwght cigs
estimates store reg1
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 32.24
Model | 13060.4194 1 13060.4194 Prob > F = 0.0000
Residual | 561551.3 1,386 405.159668 R-squared = 0.0227
-------------+---------------------------------- Adj R-squared = 0.0220
Total | 574611.72 1,387 414.283864 Root MSE = 20.129
------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.5137721 .0904909 -5.68 0.000 -.6912861 -.3362581
_cons | 119.7719 .5723407 209.27 0.000 118.6492 120.8946
------------------------------------------------------------------------------
reg bwght cigs white
estimates store reg2
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(2, 1385) = 27.47
Model | 21925.2578 2 10962.6289 Prob > F = 0.0000
Residual | 552686.462 1,385 399.051597 R-squared = 0.0382
-------------+---------------------------------- Adj R-squared = 0.0368
Total | 574611.72 1,387 414.283864 Root MSE = 19.976
------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.5059517 .0898216 -5.63 0.000 -.6821527 -.3297507
white | 6.148295 1.304469 4.71 0.000 3.589346 8.707244
_cons | 114.9317 1.173547 97.94 0.000 112.6296 117.2339
------------------------------------------------------------------------------
Make a table with results from reg1 and reg2. Lots of opportunities to make the regression table nicer!
esttab reg1 reg2, title("Joint regression table:")
Joint regression table:
--------------------------------------------
(1) (2)
bwght bwght
--------------------------------------------
cigs -0.514*** -0.506***
(-5.68) (-5.63)
white 6.148***
(4.71)
_cons 119.8*** 114.9***
(209.27) (97.94)
--------------------------------------------
N 1388 1388
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Make the same table and save it in a file called regtable1.rtf. This file can later be used in a Word document (the rtf format is directly readable by Word). Note: You can also export to .tex format using the same command but replacing .rtf with .tex.
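A command along the following lines would do this. It is a sketch reconstructed from the description, not necessarily the exact command from the hint dofile, and it assumes reg1 and reg2 have been stored as above and that the $tables global from the preliminaries is defined.

```stata
* Save the table as regtable1.rtf in the current working directory
esttab reg1 reg2 using regtable1.rtf, title("Joint regression table:") replace
* Same table in LaTeX format, saved in the tables folder
esttab reg1 reg2 using "$tables/regtable1.tex", title("Joint regression table:") replace
```

The replace option overwrites an existing file with the same name, which is what you want when you re-run the dofile.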
Use “help esttab” or Google if you want to customize your regression results. For example, you can use “estadd scalar” to add a number to your table.
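As a sketch of how estadd scalar can be used, the example below adds the mean of the dependent variable to a table. The names ymean and reg3 are made up for this illustration, and the estout package must be installed.

```stata
quietly reg bwght cigs white
* Mean of the dependent variable in the estimation sample
quietly summarize bwght if e(sample)
* Attach the number to the current regression results, then store them
estadd scalar ymean = r(mean)
estimates store reg3
* Show the added scalar at the bottom of the table
esttab reg3, scalars(ymean)
```

The scalar has to be added before the estimates are stored, otherwise it is not part of reg3.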
2. Produce and export figures
graph twoway scatter LP_agric gdpcapita, name(LP, replace)
graph export "$figures/graph_1.png", replace
graph twoway scatter share_empl_agric gdpcapita, name(empshare, replace)
graph export "$figures/graph_2.png", replace
graph combine LP empshare, cols(2)
graph export "$figures/scatterGDP.png", replace
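Figures are easier to read with proper axis titles. The sketch below adds them to the first graph; the title texts are assumptions based on the variable names, so adjust them to what the variables actually measure.

```stata
graph twoway scatter LP_agric gdpcapita, ///
    xtitle("GDP per capita") ///
    ytitle("Labour productivity in agriculture") ///
    title("Productivity and income") ///
    name(LP, replace)
graph export "$figures/graph_1.png", replace
```

Because the graph is still named LP, the graph combine command above works unchanged.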
5. Comment a lot!
Use “*” to write comments in your dofile. This symbol can also be used to “turn off” a command: place “*” in front of the command to save it for later. Use “//” if you want to comment on the same line as a command (see point no. 6 for an example). You can also create a full section of text only (parts that are not interpreted as code) by writing “/*” to start the section and “*/” to end it (see point no. 1 for an example).
Code is hard to interpret, both for your future self and for co-authors. Note: You have to hand in your dofile for assignments (as an appendix in your main pdf file). The dofile should 1) reproduce your results, 2) make clear what you have done (use comments), and 3) look tidy.
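The three comment styles side by side, as a small sketch using the regressions from assignment 1:

```stata
* A full-line comment: describe what the next step does
* reg bwght cigs                (a command switched off for now)
reg bwght cigs white            // comment on the same line as a command
/*
Everything between the two markers is ignored by Stata,
so you can write several lines of notes here.
*/
```

Note that “//” needs a space in front of it when it follows a command.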